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Self-organizing maps (SOMs) are a technique that has been used 
with high-dimensional data vectors to develop an archetypal set of 
states (nodes) that span, in some sense, the high-dimensional space. 
Noteworthy applications include weather states as described by weather 
variables over a region and speech patterns as characterized by fre- 
quencies in time. The SOM approach is essentially a neural net- 
work model that implements a nonlinear projection from a high- 
dimensional input space to a low-dimensional array of neurons. In the 
process, it also becomes a clustering technique, assigning to any vec- 
tor in the high- dimensional data space the node (neuron) to which it 
is closest (using, say, Euclidean distance) in the data space. The num- 
ber of nodes is thus equal to the number of clusters. However, the pri- 
mary use for the SOM is as a representation technique, that is, find- 
ing a set of nodes which representatively span the high-dimensional 
space. These nodes are typically displayed using maps to enable visu- 
alization of the continuum of the data space. The technique does not 
appear to have been discussed in the statistics literature so it is our 
intent here to bring it to the attention of the community. The tech- 
nique is implemented algorithmically through a training set of vec- 
tors. However, through the introduction of stochasticity in the form 
of a space-time process model, we seek to illuminate and interpret 
its performance in the context of application to daily data collection. 
That is, the observed daily state vectors are viewed as a time series 
of multivariate process realizations which we try to understand under 
the dimension reduction achieved by the SOM procedure. 

The application we focus on here is to synoptic climatology where 
the goal is to develop an array of atmospheric states to capture a 
collection of distinct circulation patterns. In particular, we have daily 
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weather data observed in the form of 11 variables measured for each 
of 77 grid cells yielding an 847 x 1 vector for each day. We have such 
daily vectors for a period of 31 years (11,315 days). Twelve SOM 
nodes have been obtained by the meteorologists to represent the space 
of these data vectors. Again, we try to enhance our understanding of 
dynamic SOM node behavior arising from this dataset. 

1. Introduction. Self-organizing maps (SOMs) are a technique that has 
been used with high-dimensional data vectors to develop an archetypal set 
of states (nodes) that span, in some sense, the high-dimensional space. First 
developed by Kohonen (1995), the technique has subsequently found appli- 
cation to automatic speech recognition, analysis of electrical signals from the 
brain, data visualization and meteorology. See, for example, Ferrandez et al. 
(1997), Tamayo et al. (1999), Kaski (1997) and Crane and Hewitson (2003), 
respectively. 

The SOM approach is essentially a neural network model that imple- 
ments a nonlinear projection from a high-dimensional input space to a low- 
dimensional array of neurons. In the process, it also becomes a clustering 
technique, assigning to any vector in the high-dimensional data space the 
node/neuron (reference vector) to which it is closest (using, say, Euclidean 
distance) in the data space. The number of nodes is thus equal to the num- 
ber of clusters. However, the primary use for the SOM is as a representation 
technique, that is, finding a set of nodes which representatively span the 
high-dimensional space. These nodes are typically displayed using maps to 
enable visualization of the continuum of the data space. Hence, the approach 
should not be viewed as an "optimal" clustering technique; in particular, in 
application it is expected to produce roughly equal cluster sizes. 

A SOM algorithm is usually implemented in three stages. First, a specified 
number of nodes is selected and the values of the components for each node 
are initialized, typically selecting random values. Second, iterative training 
is performed where the nodes are adjusted in response to a set of training 
vectors so that the nodes approximately minimize an integrated distance 
criterion. The last stage of the SOM technique is visualization where each 
node's reference vector is projected in some fashion to a lower dimensional 
space and plotted as a map (perhaps several maps). Customary projection 
creates a set of neurons in two-dimensional space which arise as a deforma- 
tion of a regular lattice. For a given training set, the frequency of occurrence 
of each node can be calculated as well as the average error at each node, the 
latter interpreted as a measure of coherence around the node. With regard 
to implementation, the number of nodes is arbitrary. 

In any event, it is not our contribution here to criticize the SOM approach 
or to compare it with other clustering procedures. Rather, in practice, the 
procedure is implemented in a purely algorithmic manner, ignoring any spa- 
tial or temporal structure which may be anticipated in the training set. Our 
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contribution is to attempt to incorporate structural dependence, through 
the introduction of stochasticity in the form of a space-time process model. 
As a result, we hope to illuminate and interpret the performance of the SOM 
procedure in the context of application to daily data collection. That is, the 
observed daily state vectors are viewed as a time series of multivariate spa- 
tial process realizations. Working with the original high-dimensional data 
renders formal modeling infeasible. Instead, we try to achieve this under- 
standing through the dimension reduction achieved by the SOM procedure. 

The application we focus on here is to synoptic climatology as introduced 
by Hewitson and Crane (2002), where the goal is to develop an array of 
atmospheric states to capture a collection of distinct circulations. There 
has been some literature on estimating synoptic states with the purpose 
of downscaling climate models. For example, Hughes, Guttorp and Charles 
(1999) and Bellone, Hughes and Guttorp (2000) propose nonhomogeneous 
hidden Markov models (NHMM) which relate precipitation occurrences and 
amounts at multiple rain gauge stations to broad-scale atmospheric circula- 
tion patterns. In particular, both papers assume that occurrences and pre- 
cipitation amounts at each rain gauge are conditionally independent given 
the current synoptic weather state, that is, all the spatial dependence be- 
tween rain gauges is induced by the synoptic weather state. See, also, the 
recent work of Vrac, Stein and Hayhoe (2007) in this regard. 

In this paper we work with daily weather data observed in the form of 
11 variables measured for each of 77 grid cells yielding an 847 x 1 vector for 
each day. We have such daily vectors for a period of 31 years (11,315 days). 
Twelve SOM nodes have been obtained by the meteorologists to represent 
the space of these data vectors. Fuller detail is provided in Section 3. We 
also note that a broader view of the use for a SOM in climatology is for 
inference at longer than daily time scales. 

The format of the paper is as follows. In Section 2 we provide a brief review 
of the SOM theory and implementation. In Section 3 we detail the motivat- 
ing dataset and some exploratory data analysis. In Section 4 we present a 
collection of models to investigate. Section 5 addresses model fitting issues, 
while Section 6 considers model selection and results. Section 7 offers some 
summary discussion. 

2. A review of SOM theory and implementation. A self-organizing map 
(SOM) is a neural network model and algorithm that implements a non- 
linear projection from a high-dimensional space of input vectors to a low- 
dimensional array of neurons. That is, input vectors are assigned to nodes 
(or neurons). Nodes have two positions, one in the high-dimensional space, 
say, in a subset of M'^, one in the low-dimensional visualization space, typi- 
cally taken to be a deformation of a regular lattice in two-dimensional space. 
For a given set of nodes {wi, W2, . . . , wj\/} in the high-dimensional space, an 
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array index taking values in {j = 1, 2, . . . , M} is defined, for each x G M , as 
c(x, {wm}) = j if d(x,Wj) = mini<m<A/ Wm) for some distance d (usu- 
ally Euclidean). The theoretical objective of the SOM is to minimize, over 
aU choices of {w^, m = 1, 2, . . . , M}, ^ g{d{yL, Wc(x,{w„})))p(x) dx, where g{-) 
is a monotone function and p(x) is the density function for the random 
input vectors in M'^. Solution to this vector quantization problem is gen- 
erally intractable. We note that if we confine x to a bounded rectangu- 
lar subset of M"^ and if p(x) is assumed uniform over this subset, then, at 
the optimal {wm}, c will be equally likely to take on each of its M pos- 
sible values. Hence, with a sample of x's from this uniform distribution, 
we expect equal numbers of the x's to be assigned to each of the index 
values, that is, to each of the nodes. A special version which seeks to min- 
imize / X^m. ^(wc(x,{w„i})) Wm,)(7((i(x, Wm))p(x) dx, where /i is a smoothing 
kernel, is customarily used. It lacks a closed form solution, but an approx- 
imate solution can be obtained iteratively using stochastic approximation 
[see Kohonen et al. (1996)] as we describe below. 

We now offer a bit more detail on the nature of a SOM algorithm. In 
practice, the SOM procedure consists of three stages. Let {xj G]R'^},i = 
l,2,...,n, denote the input training vectors. In our case, d = 847 reflect- 
ing the daily 847-element climate records from 1970 to 2000. SOMs seek 
to "optimally" place a specified number of nodes, M, again denoted by 
Wm G W^,m = 1, 2, . . . ,M. In the SOMs literature (and, as the default in 
the publicly available software package cited below) the suggested number 
of nodes is 5-y/n. In our application, with n ~ 10,000, this would suggest 
roughly M = 500 nodes. However, climatologists categorize far fewer types of 
circulation patterns; for our South African data, they conclude that M = 12 
is adequate. With 500 nodes, a two-dimensional representation off'ers the 
best prospects for visualization. However, with our 12 nodes describing 11 
variables over an 11 x 7 grid, we can create more appropriate maps. For 
example, for each variable, we can provide 12 panels, each panel a map over 
the geographic space. 

We describe two versions of the iterative training algorithm procedure of 
the SOM technique as follows: 

• Initialization stage: given M, the node vectors are initialized with random 
values. 

• Iterative training (Version 1): 

- At step t, randomly choose an input vector x^*) from the training set 
{xj} for i = 1, . . . , n. 

- Compute the distance (e.g., Euclidean) between x^*) and each of the 
node vectors Wm- Identify the winning node ^c(-x.W) whose node vector 
is closest to the input vector, that is, || w^^jj.(t)^ — x^*) || < ||wm — x^*) || for 
mG {1,2,...,M}. 
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- Every node has its vector adjusted according to the fohowing equation: 

w(^+^) = wW +a(t)i^(m,c(x(*)))(xW - wW), 

where K{m,c{x.^^^)) is cahed the neighborhood function, and a{t) is 
called the learning rate, which is usually a decreasing function of step 
t. One example of K{m,c{x^^^)) is the Gaussian kernel i^(m,c(x(*))) = 

exp{ — ||wm — w^(^(t)^ |p/2(T^}. A simpler choice is a so-called "bubble" 
function, that is, a uniform over the neighborhood (Voronoi tessellation) 
of w^(^(t)), zero elsewhere. 

Usually, the SOM training is performed in two phases. A relatively 
large initial learning rate is used in the first phase and a smaller learning 
rate is used in the second phase. This updating suggests that nodes close 
to the winner node, as well as the winner itself, update their vectors 
closer to x^*^ in the input data space. Vectors associated with far away 
output nodes do not change significantly. 

- Repeat the above steps until the nodes converge. (Convergence is vaguely 
defined and is usually taken as the default in the software.) 

• Iterative training (Version 2): 

- At step t, for each input vector Xj for i = 1, . . . ,n, compute the distance 
(e.g., Euclidean) between Xj and each of the node vectors Wm ■ Identify 
the winning node c(i) whose node vector is most similar to the input 
vector, that is, ||w^*^^j — < ||wm — Xj|| for m G {1, 2, . . . , M}. 

- Every node has its vector adjusted according to the following equation: 

^(t+l) _ Sj=l:n ^m.,c(i) (^)xi 

where ^m,c(j) (^) is the neighborhood function around the winning node 
c{i). One example is /im,c(i)(0 = a{t)K{m,c{i)){t), where, again, 
K{m, c{i)) is the Gaussian neighborhood kernel K{m, c{i)) = exp{ — ||wm — 
w^*^^^ |p/2(T^}. Here, a{t) is called the learning rate and is usually a de- 
creasing function of step t. 

Again, the SOM training is performed in two phases. Again, a rela- 
tively large initial learning rate is used in the first phase and a smaller 
learning rate is used in the second phase. In this updating, the contribu- 
tion (weight) of a particular training vector to each node only depends 
on the distance between the corresponding winning node of this training 
vector and each of the other nodes. /im,c(i) can be viewed as a smoothing 
function such that nodes close to the winner node as well as the winner 
itself update their vectors closer to the training vector in the input data 
space. 

- Repeat the above steps until the nodes converge. 
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• The final stage seeks to achieve visuahzation. When the number of nodes 
is large, visualization is most easily presented in two dimensions begin- 
ning with either a rectangular or hexagonal lattice of nodes. The iterative 
updating of the nodes eventually leads to a distortion of the lattice. (See 
Figure 4 and related discussion below.) An approach which is incorporated 
into the standard SOMs software (cited below) is the Sammon mapping 
scheme [Sammon (1969)]. Note that the goal here is "clustering" com- 
ponents of the vectors to achieve a two-dimensional representation, not 
clustering of the training vectors. In order to equalize the contributions 
of each of the components in the high-dimensional vectors with regard to 
classification, centering and scaling is recommended as pre-processing of 
the data. In our case this is certainly needed since the climate variables are 
measured over very different scales. Returning to the Sammon projection 
method, the basic idea is to arrange all the nodes on a 2-dimensional plane 
in such a way that the distances between the nodes in a 2-dimensional 
space resemble the distances in the original vector space as defined by 
some metric as faithfully as possible. Given the distance matrix D with 
element dij being the distance between node i and node j according to 
some metric (e.g., Euclidean distance), our goal is to find Om in for 
each node m for jn = 1, . . . , M to minimize an error function E defined by 
the following cost function: E = , J2i E-i>i ■ Note 

that the O's need a "center" to locate them. Also, the projections can be 
implemented at each iteration to see stability, hence convergence, as well 
as to assess interpretation. 

A software package for implementing SOMs is available 
(http://www.cis.hut.fi/research/som-research/). For more detailed 
explanation of the SOM procedure, see the references and guidelines at this 
publicly available software website. 

As a final comment on visualization, in our application below, with M 
small and data associated with spatial locations, we can create maps in 
geographic space, as proposed above. 

3. The dataset and some exploratory data analysis. The weather in a 
local region is conditional on the nature of the synoptic state of the atmo- 
sphere. Relating the synoptic scale characteristics to local scale responses 
requires the reduction of a large number of variables into a smaller set of 
data that still, in some sense, represent the original data. This goal moti- 
vated the use of the SOM technology. In this application we use daily mul- 
tivariate weather data over a specified time period to produce generalized 
weather circulations. These are then easily visualized as an array of archety- 
pal synoptic circulations that span the continuum of the data. In so doing. 
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daily synoptic atmospheric data are categorized into a prescribed number 
of archetypal synoptic (circulation) modes characteristic of a specified time 
period. South African weather systems have been categorized into six to 
eight main "types" of circulation [Tyson and Preston- Whyte (2000)]. On 
testing various SOM sizes, a 12-node SOM was selected which was deemed 
to adequately represent all the expected synoptic types. 

The SOM was trained using gridded (2.5° x 2.5°), daily mean atmospheric 
fields constructed from six-hourly National Center for Environmental Pre- 
diction/National Center for Atmospheric Research (NCEP/NCAR) global 
reanalysis data [see Kalnay et al. (1996)]. Data were extracted for a domain 
with 11 X 7 grid cells over southern Africa whose latitudinal and longitu- 
dinal extent (25°S to 40°S; 10°E to 34.5°E) captures synoptic circulation 
patterns from the sub-tropics to the mid-latitudes. The following 11 vari- 
ables were chosen as training data: mean sea level pressure, 500 hectopascal 
(hPa) geopotential height level, relative and specific humidities at the sur- 
face and at 700 hPa, daily maximum temperature at the surface, U- and 
V-wind components at the surface and at 700 hPa. Each of these variables 
was first standardized using the mean and the standard deviation of its cor- 
responding 11 X 7 cell time series. These standardized variables were then 
used to create an 847-element vector (11 x 11 x 7) which described the daily 
atmospheric state. The time period from 1970 to 2000 was used, which re- 
sulted in 11,315 daily records that were used to train the SOM. (The eight 
extra days in the included leap years were not included in the analysis for 
computational purposes, but these would not significantly alter the results.) 
Each of the 11 climate variables was standardized separately to preserve the 
local gradients in each field. 

The twelve resulting SOM nodes are labeled with their locations in two- 
dimensional space in Figure 1. This figure is intended to suggest that nodes 
near to each other are associated with somewhat similar synoptic states and 
that transition in SOM nodes is most likely to be to a neighboring node. 

To clarify the visualization, the SOM of sea level pressure (SLP) is pre- 
sented in Figure 2. It is used to assess the characteristic surface circulation 
associated with each node as it most clearly demonstrates the general synop- 
tic circulation as well as associated regional weather patterns. Elaborating 
further, we briefiy detail the features of the synoptic types captured by the 
12 SOM nodes. South Africa is a semi-arid environment and can very gen- 
erally be divided into summer and winter rainfall regions. Summer rainfall 
occurs over the interior of the country as a result of convective processes and 
winter rains over the south-western and southern coastal regions as a result 
of the passage of cold fronts. The SOM results show the majority (80%) of 
summer days map to nodes 5, 6, 8 and 9 and to a lesser degree nodes 3 
and 12. These nodes are associated with characteristic summer circulation 
features. On the right-hand side of the SOM, a sub-tropical low pressure 
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SOM Nodes in 2-dim 
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Fig. 1. Projected locations of SOM nodes in 2- dimensional space. 

system is situated over the northern part of the domain which bring rainfah 
to the interior of the country. In nodes 8, 9 and 12 a high pressure system is 
located at relatively high latitudes to the south of the domain which pushes 
frontal systems southward and results in dry summers over the western parts 
of the country and introduces moisture to the eastern and central parts. In 
node 5 a linkage between the tropical low and mid-latitude circulation forms 
a tropical-temperate trough which results in rainfall over a large part of the 
interior of the country. The majority of winter days (over 70%) map to nodes 
on the left-hand side of the SOM (nodes 1, 4, 7, 10). To the south of the 
country, these nodes are associated with the west-east progression of mid- 
latitude cyclones (cold fronts) across the south of the country which bring 
rainfall to the south and south-western parts of the country and very cold 
temperatures, especially over the interior. Over the interior of the country, 
the sub-tropical low has moved northward and is replaced by a high pressure 
system which dominates the circulation resulting in cold, dry conditions. In 
nodes 7 and 10 a high pressure system brings cold, polar air into the coun- 
try once the cold front has moved past. A typical winter synoptic sequence 
would be a progression from node 1 to node 4 to node 7 to node 10 over the 
period of about 2-3 days. Most spring days map to nodes 3, 10, 11, 12 and 
most autumn days map to nodes 1, 3 and 12. These nodes represent both 
summer and winter circulations expected in a transitional season. 
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Fig. 2. Sea level pressure (Pascals) associated with each node. 



Figure 3 is also presented in order to show a so-called 30 year climatol- 
ogy. It provides the departures, at grid cell level, from the 30 year mean 
(anomalies) of upper air specific humidity fields of each of the nodes. The 
anomaly map for specific humidity is more visually informative than the 
unadjusted map. Specific humidity in the upper atmosphere is used because 
it has been shown to be an important component for rainfall in the region 
in both summer and winter [Cavazos and Hewitson (2005)]. Over the inte- 
rior of the country in the nodes representative of summer circulations, the 
high negative anomalies demonstrate the high moisture content in the at- 
mosphere which (with other meteorological factors) result in wet summers. 
The high positive anomalies over the interior seen in the nodes associated 
with winter circulations indicate a lack of moisture in the atmosphere and 
dry winters. The opposite can be seen for the winter rainfall regions. 
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Fig. 3. Upper air specific humidity anomaly field associated with each nod 



Undertaking some preliminary exploratory data analysis, a first investi- 
gation is to explore the frequencies of occurrence of each node, hence of the 
synoptic climate systems. Table 1 (left) provides a histogram showing the 
frequencies of daily observations mapped to each node over the entire study 
period, from which we observe fairly evenly distributed percentage frequency 
of occurrence for each synoptic node and no particular archetype is found 
dominant over the study period. This is an anticipated result of the SOM 
algorithm as clarified in Section 2. As a crude look to infer temporal behav- 
ior of the synoptic climate states, Table 1 (right) compares the histogram 
of frequency of occurrence during 1970 to 1979 along with that during 1990 
to 1999. Some evidence of temporal shifting in the distribution of incidences 
over the study area is seen. For example, nodes 1, 2 and 3 occur more fre- 
quently during 1990 to 1999 compared with the 1970s. Table 2 (left) shows 
the frequency distribution of occurrence for each node in the summer season 
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Table 1 

(Left) Frequency of occurrence of each node over the entire study period, for example, 
935^*) indicates that the total number of occurrences of node 4 is 935; (right) Frequency 
of occurrence of each node during 1970-1979 and 1990-1999, for example, (278, 281)''^' 

indicates that the number of occurrence of node 5 during 1970-1979 is 278, and that 
number is 281 during 1990-1999 
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(December, January, February). It is clear that the chmate archetype which 
corresponds to nodes 5, 6, 8 and 9 dominates during the summer period. 
As shown in Table 2 (right), the dominant climate archetypes transfer to 
another type of circulation in winter (June, July, August). Now, we see high 
frequency of SOM nodes 1, 4, 7 and 10. We may also use SOM arrays to 
examine short term (e.g., daily) temporal evolution of synoptic events. The 
frequencies of daily transitions from each node to other nodes are calculated 
and shown in Table 3, which reveals a somewhat clockwise cyclic evolution 
(with regard to Figure 1) of the weather systems. For example, SOM node 
group 9 displays preferential transition to group 6, while SOM node group 
6 tends to most prefer transition to group 3. In Section 4 we elaborate this 
analysis by introducing formal time series modeling to interpret the SOM 
arrays. 

Table 2 

(Left) Frequency of occurrence of each node during summer; (right) Frequency of 
occurrence of each node during winter 
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Table 3 

Empirical transition probabilities. Each 4 by 3 sub-table in the following 4 by 3 array 
shows a set of transition probabilities. The array and sub-tables are arranged in the same 

way as in Table 1 
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4. SOM modeling. 

4.1. Dimension reduction. The daily climate reference data consist of an 
847 X 1 vector for each day within a 31-year period, which raises method- 
ological and computational challenges when we attempt to interpret them in 
high dimension. In fact, since the 847 x 1 vector arises as an 11-dimensional 
vector at each of 77 grid units, it is clear that we are monitoring a multivari- 
ate space-time data process. We do not seek to model this process directly, 
a very challenging task to develop and, likely, infeasible to fit. Rather, we 
seek to understand this process in terms of the SOM nodes that have been 
created. We will take advantage of the dimension reduction provided by the 
SOM procedure to, instead, model the induced collection of two-dimensional 
locations across time. As we remarked earlier, the SOM algorithm ignores 
time and space in creating the nodes. By introducing a space-time process 
model, we seek to enhance behavioral interpretation for the set of SOM 
nodes. The result of the SOM algorithm yields, in our case, twelve nodes, 
each with an associated two-dimensional location (Figure 1). We now seek 
to project the 11,315 daily state vectors onto this space of locations. Many 
schemes are available to accomplish this; there is no "best" one. We choose to 
map daily high-dimensional reference data onto a 2-dimensional surface us- 
ing high-dimensional pairwise distances along with the 2-dim coordinates of 
the SOM nodes. For each daily state vector, the Euclidean distances between 
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it and each of the nodes are calculated in high-dimensional space. Then, for 
each daily vector, nodes are ranked by their distances to the vector. We next 
introduce a greedy space searching technique that maps each daily vector 
onto a 2-dimensional surface. To be specific, a 2-dimensional point within 
a finite bounded region is selected as the projection if the ranks of the Eu- 
clidean distances in two dimensions to each node using this point agree with 
the ranks between the vector and the nodes in high-dimensional space. Evi- 
dently, there may be no point in 2-dimensional space which satisfies this con- 
dition, so we seek agreement in ranking starting from the smallest distance. 
Also, for a given high-dimensional point, such mapping may yield multi- 
ple mapped positions that provide the same extent of agreement in terms 
of rank distance agreement. In that situation, we averaged the coordinates 
of the multiple positions to ensure the uniqueness of the mapped position. 
Such an algorithm is easy to code and easily handles 11,315 points in 847- 
dimensional space. In Section 4.2 we focus on modeling the 2-dimensional 
coordinates derived using this method. Other projection approaches utiliz- 
ing alternative, possibly global, optimization criteria are certainly available, 
though they may be difficult to implement. However, the modeling approach 
we develop in Section 4.2 can be applied to the results of any projection 
strategy. 

4.2. Modeling specifications. The projection method described in Section 
4.1 is performed on the daily referenced data from 1970-2000 to yield 2- 
dimensional mapped coordinates which are plotted in Figure 4 along with the 
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Fig. 4. Mapped coordinates in the 2-dimensional space for each of the 11,315 days. 
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coordinates of the SOM nodes. We can see that the two-dimensional space 
is naturahy partitioned into 12 tessellations, each attached to a SOM node. 
Of course, Figure 4 gives no information regarding the temporal sequence 
of the points. 

However, let = (xtjUt)' denote the coordinates in 2-dims for day t, 
t = 1, . . . ,T. Before beginning the time series analysis, it is natural to ex- 
amine the autocorrelation in this bivariate time series. We ran a standard 
vector autoregression software package for lags 1 up to 50. The plot (not 
shown in the interest of space) finds an adjusted AIC value of 7.88 for the 
AR(1) model, 7.81 for the AR(2) model and reaches its minimum at 7.76 
for the AR(24) model. So, while there may be some evidence of longer range 
dependence in the series, the relative decrease in the model choice criterion 
is very small; AR(1) models may be good enough. Moreover, with interest 
in studying transition probabilities, in the sequel we work exclusively with 
AR(1) specifications. Under 12 nodes this still yields 144 transition proba- 
bilities. For the AR(2), we arrive at 1728 transition probabilities, too many 
to estimate well and to display. 

We start the analysis with a bivariate random walk Gaussian model 

st+i = st + et+i, 

where St follows a bivariate Gaussian distribution centered at with a 2 x 2 
covariance matrix S. Under this model, the conditional Gaussian probabil- 
ity density functions of the coordinates at each time step are completely 
determined by the coordinates at the previous time step along with the 
covariance matrices. For us, the bivariate random walk model plays the 
role of a straw man. If the SOM nodes effectively capture synoptic weather 
states, there should be some structure to the daily transitions in the s^'s. In 
other words, the algorithm yielding the SOM nodes is applied to spatially- 
referenced explanatory climate variables observed over time and, therefore, 
we would expect behavior with a more mechanistic description than purely 
random movement of the daily states in two dimensions. In this regard, de- 
note Y' = (s2, . . . , St), X' = (si, . . . , st_i). Then, the conditional maximum 
likelihood estimator (MLE) of S is 

S = (Y - X)'(Y - x)/(r - 1). 

A very general form of bivariate time series model is the following: 

(1) st+i = A{st,t)st + 'n{st,t) + et+i, 

where A(sf,t) is a 2 x 2 unknown matrix containing autoregression coeffi- 
cients that are allowed to vary in space and time, the values of which are 
specified by location and time at the preceding step. r]{st,t) enters (1) as 
an adjustment to the autoregressive component and can also be specified as 
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a function of the preceding location and time. Again, error terms €2, ■ ■ ■ ,st 
are independently identically distributed A^(0, 5]) representing unstructured 
noise or pure error in the model. The model in (1) provides a very flexible 
specification in the form of a locally affine transition model. In fact, it is also 
very challenging to fit. We are convinced that, even with more than 11,000 
days of data, the data cannot support or identify such a general model; we 
can not achieve a well-behaved MCMC fitting algorithm. Hence, we turn to 
some model simplifications. We begin with specifications on A{st,t): 

• The first is a constant transformation matrix model A(st,t) = A yielding 

(2) St+i = Ast + St+i. 

This is a simple case of vector autoregressive (VAR) models, which have 
been widely used in multiple time series analysis [see, e.g., Sims (1972) 
and Enders (2003)]. 

• We next consider a spatially-varying transformation matrix. It is most 
convenient to assign a distinct transformation matrix to each of the tes- 
sellations induced by the SOM nodes as described above. Let A/ be 
the transformation matrix when St G A;, where A; is tessellation I, for 
I = 1,2, . . . , L. Let Zi be a binary indicator, that is, Zi{st) = 1 if G A^, 
and otherwise. Then, 

L 

(3) Aist,t)=Y,AiZi{st). 

1=1 

This specification allows us to study regional change in the linear trans- 
formation. 

• One of the important questions we seek to address in our climate study is 
whether there is a change in circulation patterns. That is, we assume the 
same collection of synoptic states continues to operate over time. However, 
temporal change would be manifested by a change in incidence rates of the 
states and thus would be modeled using time varying transition matrices. 
Let Bm be the transformation matrix when t € Tm, where Tm are disjoint 
time blocks, that is, Um=i '^m = {l, • • • ,T}. Let Vm{t) = 1 if t G r^, and 
otherwise. Now, 

M 

(4) A{St,t)=Y.BmVm{t). 

m=l 

Expression (4) enables us to study temporally varying linear transforma- 
tion over suitable time scales, for example, months, quarters or years. 

• A spatially and temporally varying structure can be extended from the 
specifications described above in the form 

(5) A{St,t) = Y,^m,lZl{St)Vm{t). 

l,m 
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Special cases of (5) include separable forms in space and time, for example, 
; = B^A/ or Drn^i = A^Bm- The former provides spatial transition 
followed by temporal transition, the latter vice versa. 

The modeling in (3), (4) and (5) works at the aggregated spatial and 
temporal scale. Similarly, we could add spatially and temporally aggre- 
gated intercepts. In particular, we could introduce r]{st,t) = 77, r]{st,t) = 
T^iLiVi^iist), or T]{st,t) = Em=i^m^m(i)- Howcver, we view the role of 
the r]{st,tys as introducing point level refinement to aggregated level affine 
transformations. We do so by introducing a bivariate spatial Gaussian re- 
alization intended to provide spatially dependent adjustments to the linear 
transformation specification. The adjustment at time t is Ti(st), yielding the 
model 

(6) St+i=A{st,t)st + r]{st) + et+i. 

We propose a coregionalization model for the bivariate Gaussian process 
realization in the spirit of Gelfand et al. (2004). Let w(s) = {wi{s),'W2{s))' , 
where t(;i(s) and 1^2 (s) are uncorrelated spatial processes, each with zero 
mean and unit variance. Coregionalization constructs a bivariate spatial pro- 
cess by linear transformation of these two independent univariate processes, 
that is, ri{s) = (r7i(s),r/2(s))' = Q(?i;i(s), ■u;2(s))', where Q is a 2 x 2 unknown 
coregionalization matrix and can be taken as lower triangular without loss of 
generality, that is, Q = (^^" ^^^). An unusual aspect of the employment of 
this bivariate specification is t^at it provides a spatial surface to smooth all 
locations in the region while the observations are, in fact, a time series of lo- 
cations. In other words, this bivariate spatial process is created for observed 
locations at multiple time points rather than multiple locations observed at 
the same time. The process realization reflects the spatial variation unex- 
plained by the autoregressive component, regardless of the specific times of 
the observations. 

The model in (6) is now completely specified. However, recall that we 
work with 11,315 days, hence 11,315 locations in total. The joint distribu- 
tion of the collection of 11,315 t7(s)'s introduces an 11,315 x 11,315 covari- 
ance matrix. To handle this dimension, we employ a version of the predic- 
tive process model described in Banerjee et al. (2008). Briefly, we consider 
a set of "knots" S* = {sj,...,s^}, which forms a subset of the study re- 
gion in 2-dimensional space. The bivariate Gaussian process above would 
yield w* = [w{s*)]'^^ ~ MVN{0,C*{6)) as its realizations over S*, where 
C*{6) = [C{s* ,Sj;6)]l"'j^-^ is the corresponding m x m covariance matrix. 
The predictive process model is defined as 

(7) w{s)=E[w{s)\w*]=c^{s;e)C*-\e)w*, 

where c{s;6) = [C(s, s|; 0)]^^^^. In fact, w{s) is a Gaussian process with 
covariance function C{s,s';0) = c*^(s; 0)C*"^(0)c*(s', 0), where c*{s;0) = 
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[C{sq,s*;0)]Y=i- The realization of w{s) on any collection of sites is the in- 
terpolated predictions conditional upon the realization of w{s) over S* . To 
work with this process, we only need to work with wj, W2 and the associated 
pair of m X m correlation matrices. 

5. Model fitting issues. VAR models are well discussed in the literature 
[see, e.g., Liitkepohl (1993) and Zivot and Wang (2006)]. Analysis within 
the Bayesian paradigm is presented in, for example, Sims and Zha (1998), 
Sun and Ni (2004). We employ MCMC to fit the various submodels of (6) 
described in the previous section. In fact, we first discuss the computational 
issues in fitting the proposed models without spatial adjustment. Then we 
turn to issues in fitting the models with such adjustment. 

5.1. Fitting models without spatial adjustment. For those proposed mod- 
els without spatial adjustment, we illustrate the MCMC fitting procedure 
for model St+i = J2f=i A/Z/(sj)sf + St+i- 

Denote Z(sj) = (Zi(sj), . . . , ^^(st)), xj = (Z(sO®sO, Y' = (s2, . . . , sy), 
X' = (xi, . . . ,xt-i), e' = {e2, ■ . ■ , st), = {A[, . . • , A^). Here Y and e are 
(r — 1) X 2 matrices, $ is a 2L x 2 matrix of unknown transformation pa- 
rameters, xt is a 1 X 2L row vector and X is a (T — 1) x 2L matrix of 
observations. Then the VAR model can be written as 

Y = X* + e. 

The MLE's of $ and 5] are obtained by maximizing 

{'IP 
^ (s^+i - Xi*)5]-nst+i - Xi*)' 

= |^|(T-i)/2 '^tr|-i(Y - X*)S-HY - X*)'}, 

where etr(-) = exp(trace(-)). We obtain MLE's of * and S as 4a/ = (X'X)"^X 
and ±M = (Y - X*m)'(Y - X*M)/(r - 1). 

Bayesian model fitting is completed by assigning prior distributions on 
the unknown parameters of interest. Denote cf) = vec($), that is, the vector 
obtained by concatenating the entries in We assign cf) with a flat prior. 
We consider a noninformative Jeffreys prior for S, which, in our case, is 

Given S , we can directly sample cf) from its conditional distribution given 

by 

7r((/)|5],Y) ~My7V(vec(4M),S0(X'X)-^). 

Conditional on 0, Xl is updated using an Inverse Wishart ((T — 1)'S{^m), 
T-2L). 
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5.2. Fitting models with spatial adjustment. For the models with spatial 
adjustment, for convenience, we adopt an exponential correlation function 
for each of the two parent processes, hence bringing in two decay parameters 
9i and 62-^ A uniform prior is assigned for each of 61 and 62 and updated 
using Metropolis steps. For the coregionalization matrix Q, we assign trun- 
cated normal priors with positive support for the diagonal entries, and a nor- 
mal prior for the off-diagonal entry. The entries in Q are updated from nor- 
mal distributions conditional on the other parameters. Samples of and 

*(b) 

W2 are generated in blocks from their multivariate normal posterior distri- 
butions, which in turn yield samples of 7Z)(s)W = c^(s; 0(''))C*~^(0(''))w*W . 
Substantial gains in computational efficiency are achieved by working with 
W* at a relatively small number of knots. 

Each of the proposed models enables one day ahead prediction, that is, 
the posterior predictive distribution of location at time t + 1 given location, 
say, s at time t. This is implemented by composition; a posterior draw of 
the parameters in whatever version of (6) we fit, setting = s, enables a 
predictive draw for the location at time t + Posterior samples, s^'') enable 
a density estimate for the transition distribution at any time and given any 
location. In addition, these models enable inference about the "transition 
distance," Hs^+i — St||, in 2-dimensional space. 

In fact, again using posterior predictive samples, these models allow us to 
induce inference for categorical analysis at tessellation level. The probability 
of transitioning from s to tessellation / can be straightforwardly estimated 
as well as the 12 x 12 transition matrix from SOM node to SOM node 
(we omit details). This model-based estimate can be compared with the 
empirical estimate presented in Section 3 (Figure 3) . Evidently, we can learn 
about the movement of the daily state vectors at any spatial scale in 2- 
dimensions. Working at the scale of the tessellations enables us to inform 
about circulation among synoptic weather states defined by the SOM nodes. 

6. Model comparison and model results. Given the various possible model 
specifications detailed in Section 3, our first analysis goal would appear to 
be model comparison. We consider three criteria. First, we compare models 
in terms of one step ahead prediction performance. For each observation St 
at t, samples of s^+i are drawn from the posterior predictive distribution as 
described above. Posterior means are adopted as the point estimates of the 
predicted positions for each of t = 2 to t = T. The root mean square predic- 
tive errors {RMSPE) are computed to assess the predictive performance for 



^We would not anticipate sensitivity in the bivariate predicted rj surfaces to the choice 
of correlation function. 
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each model 



1 



T 



(8) 



RMSPE 



\ T-1 



i=2 



where is the mean of the posterior samples {s\ } for b = 1,2, . . . , B . 

A second comparison among models is to study the proportion of times 
the true locations lie in their associated 95% predictive interval under each 
model. This coverage proportion is denoted by r. A third model selection 
criterion which is easily calculated from the posterior samples is the deviance 
information criterion {DIC) [Spiegelhalter et al. (2002)]. DIC is a general- 
ization of the AIC and BIC criteria; smaller values of DIC correspond to 
preferred models. 

Table 4 summarizes the RMSPE, DIC score, and f for a collection of 
models as indicated. (When time is included it is either blocked quarterly or 
annually.) Based upon the model fitting to more than 11,000 data points and 
using a large number of posterior samples (10,000), we are comfortable with 
the number of significant digits provided. First, all of the proposed autore- 
gressive models are apparently superior to the random walk model in terms 
of predictive performance and DIC scores. Second, disappointingly, the mod- 
els including spatial adjustment show no evidence of improving performance 
on data fitting and prediction; there appears to be little spatial dependence 
left in the autoregression residuals. Further disappointment emerges in the 
similarity of performance of these models; we can do better than a random 
walk model, but can not find any interesting spatial or temporal structure. 

We offer several thoughts in this regard. Perhaps the projection to a 
two-dimensional space which yields our bivariate time series has removed 
the interesting structure. In particular, the spatially varying covariate in- 
formation associated with the 11 climate variables was used to create the 
projected locations in two dimensions; it is not available to explain the bi- 
variate time series. Moreover, the spatial dependence, that is, induced in the 
two-dimensional space may have little to do with the spatial structure in the 
geographic space of the original 11 space-time processes. Finally, climatolo- 
gists would assert that the SOM which was created is not intended for short 
term weather prediction; in capturing climate states, the SOM might be 
more appropriate for assessing regional climate change over a longer tempo- 
ral scale (see Section 6). So, while our modeling goal here was to learn about 
spatial and temporal structure in the created SOM (and, what follows below 
indicates that there is still a story to tell), to learn about the space time 
structure in the original daily data a different dimension-reduction strategy 
might be more appropriate. 
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Table 4 

Performance of several VAR models using root mean square predictive errors (RMSPE), 
the deviance information criterion (DIG) and the empirical coverage probability r. 
A{tessellation) denotes regionally varying A as described in (3), A{year) denotes 
annually varying A as described in (4), ri{quarter) denotes quarterly varying A which is 
constant as year within each quarter. ri{quarter*) denotes quarterly varying A which is 
also changing as year, A{tessellation, quarter) denotes A as described in (5) and 
ri{spatial) denotes spatially varying adjustment as described in (6) 





Model specification 


RMSPE 


Die (x 10^) 


r 


Model 


St+l 


= St + e 


11.419 


1.6403 


0.929 


Model 1 


St+l 


= Ast + e 


9.226 


1.5353 


0.944 


Model 2 


St+l 


= A{tessellation)st + e 


9.125 


1.5317 


0.943 


Model 3 


St+l 


— A{tessellation)st + r]{quarter) + e 


8.943 


1.5238 


0.939 


Model 4 


St+l 


— A{quarter)st + e 


9.215 


1.5337 


0.942 


Model 5 


St+l 


— A{quarter)st + rj + e 


9.205 


1.5337 


0.941 


Model 6 


St+l 


— A{quarter)st + ri{year) + e 


9.190 


1.5297 


0.947 


Model 7 


St+l 


— A{quarter*)st + e 


9.088 


1.5378 


0.948 


Model 8 


St+l 


— A{year)st + £ 


9.224 


1.5361 


0.944 


Model 9 


St+l 


— A{tessellation, year)st + e 


8.777 


1.5407 


0.956 


Model 10 


St+l 


— A{tessellation, quarter)st + e 


8.920 


1.5247 


0.940 


Model 11 


St+l 


— Ast + r]{spatial) + e 


9.214 


1.5421 


0.934 



In any event, Model 9, which has transformation matrix A specified as 
tessellation and year, has the lowest prediction error in terms of RMSPE?' 
In addition, the f for Model 9 is quite close to 0.95, as desired. As a result, 
we summarize results based on Model 9. Table 5 provides the posterior 
means for several parameters of interest and the corresponding 95% credible 
intervals under Model 9. We notice that estimates for the elements in S take 
large values, suggesting substantial unexplained variances in the SOM array. 
Again, this reflects our lack of covariate information but also comments upon 
the utility of SOMs for one-step ahead prediction of weather states. 

Following the discussion in Section 5, we provide some illustrations of 
the induced categorical analysis at the scale of the tessellations. Figure 5 
shows the histograms of transition distance for each of the 12 tessellations 
in the year 2000, which suggests possible regional heterogeneity in the dis- 
tribution of transition distance. In fact, synoptic weather in South Africa 
may display node-specific magnitude in volatility. For instance, SOM node 
group 2 which is expected in a transitional season, on average, tends to have 
higher transition distances than SOM node group 5 associated with char- 
acteristic summer circulation features. And this phenomenon might reveal 
high climate volatility associated with SOM node group 2 and relatively low 



^We might speculate that annual blocking captures an El Nino effect. 
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Fig. 5. Histograms of transition distances for each SOM group. 



climate volatility associated with SOM node group 5. Model 9 enables us 
to make inference of the transition matrices, as well as the corresponding 
estimated errors year by year. The estimated transition matrix is shown in 
Table 6 for the year 2000, which again supports the findings on the clockwise 
cyclic evolution of the weather systems. The trajectories of transition prob- 
abilities can be aligned in each row for 31 consecutive years, from which we 
can examine the temporal behavior of the transition probabilities in synop- 
tic climate states. As an illustration, Figure 6 plots the trajectories of three 
selected transition probabilities. For the general synoptic state associated 
with SOM node 1, the persistence probability reached above 0.35 in 1986 
and then dropped below 0.15 in the subsequent year. The transition prob- 
ability from SOM node group 11 to SOM node group 12 roughly remained 



Table 5 

Posterior means and 95% credible intervals of A.teaaeUationi,year2om o,nd S 





Mean 


0.025% 


0.975% 


SE 


an 


0.604 


0.151 


1.033 


0.237 


ai2 


0.168 


-0.220 


0.618 


0.226 


a2i 


-0.199 


-0.739 


0.409 


0.286 


0,22 


-0.028 


-0.678 


0.503 


0.289 


Ell 


40.394 


39.212 


41.457 


0.566 


El2 


17.572 


16.544 


18.648 


0.531 


E22 


68.441 


66.429 


70.297 


0.961 
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at a stationary level during the 1970s, then dramatically fell below 0.15 in 
1983. It appears to be evolving with a more volatile trajectory since the late 
1980's, finally reaching a peak which is above 0.35 in 2000. 

7. Discussion. The use of SOMs has made considerable inroads in the 
meteorology community with regard to developing synoptic weather states 
to describe the collection of available weather patterns across a region. Since 
the SOM technology represents high-dimensional vectors in two-dimensional 
space, we considered vector AR models to try to better understand the 
temporal evolution of SOM nodes. We have demonstrated that, while these 
SOMs may adequately span the high-dimensional space of daily weather 
data vectors, they reveal little interesting spatial or temporal structure with 
regard to forecasting weather states. 

Finally, we offer a potentially useful remark. The inability of the SOM to 
predict short term temporal evolution of these states does not imply that the 
SOM will not be useful for the projection of future climate. If we assume that 
the SOM nodes describe regional weather well and that the same weather 
states continue to operate in the future, we may be able to forecast climate 
change in the form of a less uniform incidence of the different states than 
we currently see (Tables 1 and 2). 



Estimated mean of the transition matrix in 2000. Each 4 by 3 sub-table in the following 4 
by 3 array shows a set of transition probabilities. The array and sub-tables are arranged 

in the same way as in Table 1 



Table 6 



0.144 
0.081 
0.059 
0.030 

0.177 
0.161 
0.148 
0.103 

0.139 
0.132 
0.133 
0.107 

0.187 
0.197 
0.203 
0.168 



0.169 
0.100 
0.074 
0.037 

0.108 
0.087 
0.073 
0.049 

0.106 
0.095 
0.092 
0.076 

0.064 
0.055 
0.050 
0.042 



0.150 
0.081 
0.048 
0.026 

0.040 
0.025 
0.016 
0.012 

0.041 
0.031 
0.024 
0.023 

0.013 
0.008 
0.006 
0.005 



0.042 
0.017 
0.011 
0.004 

0.027 
0.031 
0.035 
0.029 

0.057 
0.055 
0.057 
0.043 

0.089 
0.116 
0.145 
0.165 



0.205 
0.060 
0.035 
0.011 

0.085 
0.100 
0.107 
0.077 

0.128 
0.119 
0.115 
0.081 

0.082 
0.091 
0.097 
0.091 



0.458 
0.104 
0.040 
0.013 

0.159 
0.146 
0.119 
0.085 

0.130 
0.094 
0.070 
0.052 

0.041 
0.033 
0.026 
0.024 



0.003 
0.003 
0.005 
0.004 

0.008 
0.012 
0.017 
0.019 

0.008 
0.013 
0.019 
0.024 

0.014 
0.028 
0.049 
0.085 



0.022 
0.036 
0.048 
0.041 

0.035 
0.063 
0.091 
0.094 

0.027 
0.053 
0.086 
0.124 

0.024 
0.054 
0.103 
0.227 



0.162 
0.231 
0.249 
0.194 

0.106 
0.161 
0.194 
0.200 

0.060 
0.112 
0.177 
0.297 

0.025 
0.053 
0.098 
0.240 
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Fig. 6. Transition probabilities from 1970-2000. 
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